.TH "gres.conf" "5" "Slurm Configuration File" "July 2018" "Slurm Configuration File"

.SH "NAME"
gres.conf \- Slurm configuration file for generic resource management.

.SH "DESCRIPTION"
\fBgres.conf\fP is an ASCII file which describes the configuration
of generic resources on each compute node. Each node must contain a
gres.conf file if generic resources are to be scheduled by Slurm.
The file location can be modified at system build time using the
DEFAULT_SLURM_CONF parameter or at execution time by setting the SLURM_CONF
environment variable. The file will always be located in the
same directory as the \fBslurm.conf\fP file. If generic resource counts are
set by the gres plugin function node_config_load(), this file may be optional.
.LP
Parameter names are case insensitive.
Any text following a "#" in the configuration file is treated
as a comment through the end of that line.
Changes to the configuration file take effect upon restart of
Slurm daemons, daemon receipt of the SIGHUP signal, or execution
of the command "scontrol reconfigure" unless otherwise noted.
.LP
The overall configuration parameters available include:

.TP
\fBCount\fR
Number of resources of this type available on this node.
The default value is set to the number of \fBFile\fR values specified (if any),
otherwise the default value is one. A suffix of "K", "M", "G", "T" or "P" may be
used to multiply the number by 1024, 1048576, 1073741824, etc. respectively.
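.IP
As an illustrative sketch (the resource name below is an arbitrary choice,
not a built\-in resource), a count of "20M" is read as
20 * 1048576 = 20971520:
.br
# Hypothetical countable resource; the "M" suffix multiplies by 1048576
.br
Name=bandwidth Count=20M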

.TP
\fBCores\fR
Specify the first thread CPU index numbers for the specific cores which can
use this resource.
For example, it may be strongly preferable
to use specific cores with specific devices (e.g. on a NUMA
architecture). Multiple cores may be specified using a comma
delimited list or a range may be specified using a "\-" separator
(e.g. "0,1,2,3" or "0\-3").
If the \fBCores\fR configuration option is specified and a job is submitted
with the \fB\-\-gres\-flags=enforce\-binding\fR option, then only the identified
cores can be allocated with each generic resource. This tends to improve
the performance of jobs, but slows the allocation of resources to them.
If \fBCores\fR is specified and a job is \fInot\fR submitted with the
\fB\-\-gres\-flags=enforce\-binding\fR option, the identified cores will be
preferred for scheduling with each generic resource.
If \fB\-\-gres\-flags=disable\-binding\fR is specified, then any core can be
used with the resources, which also increases the speed of Slurm's
scheduling algorithm but can degrade application performance.
The \fB\-\-gres\-flags=disable\-binding\fR option is currently required to use
more CPUs than are bound to a GRES (e.g. if a GPU is bound to the CPUs on one
socket, but resources on more than one socket are required to run the job).
If any core can be effectively used with the resources, then do not specify the
\fBCores\fR option, which improves speed in the Slurm scheduling logic.

\fBNOTE:\fR If your cores contain multiple threads, list only the first thread
of each core. Slurm schedules each GRES by core rather than by thread.
Also note that since Slurm must be able to perform resource
management on heterogeneous clusters having various core ID numbering schemes,
an abstract index is used instead of the physical core index. That
abstract index may not correspond to your physical core number.
Slurm numbers processing units from 0 to n, where 0 is the first
processing unit (core, or thread if hyper\-threading is enabled) on the first
socket, first core and first thread. Numbering then continues sequentially
through the next thread, core, and socket. The numbering generally coincides
with the processing unit logical number (PU L#) seen in lstopo output.
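.IP
A minimal sketch of this numbering, assuming a hypothetical node with two
sockets of four single\-threaded cores each and one GPU attached to each
socket. The layout and device paths are assumptions for illustration only;
compare against lstopo output on the actual node before using such values:
.br
# Socket 0 holds abstract core indexes 0\-3, socket 1 holds 4\-7
.br
Name=gpu File=/dev/nvidia0 Cores=0\-3
.br
Name=gpu File=/dev/nvidia1 Cores=4\-7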

.TP
\fBFile\fR
Fully qualified pathname of the device files associated with a resource.
The file name parsing logic includes support for simple regular expressions as
shown in the example.
This field is generally required if enforcement of generic resource
allocations is to be supported (i.e. to prevent users from making
use of resources allocated to a different user).
If \fBFile\fR is specified then \fBCount\fR must be either set to the number
of file names specified or not set (the default value is the number of files
specified).
Slurm must track the utilization of each individual device if device file
names are specified, which involves more overhead than just tracking the
device counts.
Use the \fBFile\fR parameter only if the \fBCount\fR is not sufficient for
tracking purposes.

\fBNOTE:\fR If you specify the \fBFile\fR parameter for a resource on some node,
the option must be specified on all nodes and Slurm will track the assignment
of each specific resource on each node. Otherwise Slurm will only track a
count of allocated resources rather than the state of each individual device
file.
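.IP
For illustration, assuming a hypothetical node with four GPU device files,
either of the following lines (only one would appear in a real file)
satisfies the rule relating \fBFile\fR and \fBCount\fR:
.br
# Count omitted: defaults to the number of files specified (four)
.br
Name=gpu File=/dev/nvidia[0\-3]
.br
# Count given explicitly: must equal the number of files specified
.br
Name=gpu File=/dev/nvidia[0\-3] Count=4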

.\" .TP
.\" \fBLinks\fR
.\" If multiple devices should be co-scheduled whenever possible, then specify a
.\" link number on both devices that have the same value.
.\" Devices may have multiple links and those numeric values should be
.\" identified in a comma separated list.
.\" A typical use case would be to identify GPUs having NVLink connectivity.
.\" This is an optional value.
.\"
.TP
\fBName\fR
Name of the generic resource. Any desired name may be used.
Each generic resource has an optional plugin which can provide
resource\-specific options.
Generic resources that currently include an optional plugin are:
.RS
.TP
\fBgpu\fR
Graphics Processing Unit
.TP
\fBnic\fR
Network Interface Card
.TP
\fBmic\fR
Intel Many Integrated Core (MIC) processor
.RE

.TP
\fBNodeName\fR
An optional NodeName specification can be used to permit one gres.conf file to
be used for all compute nodes in a cluster by specifying the node(s) that each
line should apply to.
The NodeName specification can use a Slurm hostlist specification as shown in
the example below.

.TP
\fBType\fR
An arbitrary string identifying the type of device.
For example, a particular model of GPU.
If \fBType\fR is specified, then \fBCount\fR is limited in size (currently 1024).

.SH "EXAMPLES"
.LP
.br
##################################################################
.br
# Slurm's Generic Resource (GRES) configuration file
.br
##################################################################
.br
# Configure support for our four GPUs
.br
Name=gpu Type=gtx560 File=/dev/nvidia0 COREs=0,1
.br
Name=gpu Type=gtx560 File=/dev/nvidia1 COREs=0,1
.br
Name=gpu Type=tesla  File=/dev/nvidia2 COREs=2,3
.br
Name=gpu Type=tesla  File=/dev/nvidia3 COREs=2,3
.br
Name=bandwidth Count=20M

.\" .LP
.\" .br
.\" ##################################################################
.\" .br
.\" # Slurm's Generic Resource (GRES) configuration file
.\" .br
.\" ##################################################################
.\" .br
.\" # Configure support for our four GPUs with NVLink
.\" .br
.\" Name=gpu Type=tesla  File=/dev/nvidia0 COREs=0,1 Links=0
.\" .br
.\" Name=gpu Type=tesla  File=/dev/nvidia1 COREs=0,1 Links=0
.\" .br
.\" Name=gpu Type=tesla  File=/dev/nvidia2 COREs=2,3 Links=1
.\" .br
.\" Name=gpu Type=tesla  File=/dev/nvidia3 COREs=2,3 Links=1
.\"
.LP
.br
##################################################################
.br
# Slurm's Generic Resource (GRES) configuration file
.br
# Use a single gres.conf file for all compute nodes
.br
##################################################################
.br
NodeName=tux[0\-15]  Name=gpu File=/dev/nvidia[0\-3]
.br
NodeName=tux[16\-31] Name=gpu File=/dev/nvidia[0\-7]

.SH "COPYING"
Copyright (C) 2010 The Regents of the University of California.
Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
.br
Copyright (C) 2010\-2014 SchedMD LLC.
.LP
This file is part of Slurm, a resource management program.
For details, see <https://slurm.schedmd.com/>.
.LP
Slurm is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your option)
any later version.
.LP
Slurm is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more
details.

.SH "SEE ALSO"
.LP
\fBslurm.conf\fR(5)
